Apparatus and method for network analysis

ABSTRACT

A system for, and method of, extracting information from multiple sessions and in accordance with disparate protocols, and transforming the same into a common language. Packets are collected by packet collectors distributed throughout a network and those packets, and/or metadata relating to those packets, are passed to an aggregator, which is made available via an application program interface to users/applications.

This application is a continuation of application Ser. No. 10/133,392, filed Apr. 29, 2002, which claims the benefit of U.S. Provisional Application Ser. No. 60/286,966, filed Apr. 30, 2001, both of which are incorporated herein by reference in their entireties.

The invention was made with Government support under a classified contract awarded by the U.S. Government. The Government may have certain rights in the invention.

BACKGROUND

1. Field of the Invention

The present invention generally relates to the field of network analysis. More particularly, the present invention relates to methods and apparatus for parsing information in network protocols into a common language for analysis.

2. Background of the Invention

Not long ago, people communicated important information between one another through the physical delivery of paper. Delivering documents in this way to convey important information once dominated business but has since been largely displaced by electronic delivery and communication. Whether it is by email or otherwise, today people send many sensitive and important documents and information electronically.

The movement to electronic distribution of information has increased businesses' awareness of security issues. Electronic files are easy to copy and transmit out of an unwitting organization. Potential saboteurs, such as hackers, can access, steal, alter, and/or destroy important information.

This increased awareness of security issues concerning electronic communications led companies to begin monitoring data transfers between entities, such as people, computers, and resources. The enormous volume of data generated by communications between entities (e.g., people viewing websites, people sending emails to one another, people transferring files to one another, and many other communications) made it difficult for a company to monitor all of the communication information. To help alleviate this problem, companies developed systems that analyze communications to determine which communications are likely illegal or otherwise prohibited by the companies' business rules.

Computers on a network send information to each other as part of a communication session. The data for this communication session is broken up by the network and transferred from a source address to a destination address. This is analogous to the postal mail system, which uses zip codes, addresses, and known routes of travel to ship packages. If one were to ship the entire contents of a home to another location, it would not be cost effective or an efficient use of resources to package everything into one container for shipping. Instead, smaller containers would be used for the transportation, and their contents reassembled after delivery. Computer networks work in a similar fashion by taking data and packaging it into smaller pieces for transmission across a network. Each of these packets is governed by a set of rules, or protocol, that defines its structure and the service it provides. For example, the World Wide Web has a standard protocol defined for it, the Hypertext Transfer Protocol (HTTP). This standard protocol dictates how packets are constructed, how data is presented to web servers, and how these web servers return data to the client web browsers.

Any application that transmits data over a computer network uses one or more protocols. There are many layers of protocols in use between computers on a network. Not only do web browsers have protocols they use to communicate, but the network has underlying protocols as well. This layering technique is called data encapsulation. For example, when you make a request to a web site, your data request is encapsulated by the HTTP protocol used by your browser. The data is then encapsulated by the computer's network stack before it is put onto the network. The network may encapsulate the packet into another packet using another protocol for transmission to another network. Each layer of protocol helps provide routing information to get the packets to their target destination.
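The layering described above can be pictured with a minimal Python sketch. The `Layer` class, the header fields, and the addresses are hypothetical simplifications; real TCP, IP, and Ethernet headers are binary structures with many more fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Layer:
    """One protocol layer: a header wrapped around an inner payload."""
    protocol: str
    header: dict                 # illustrative fields only, not a real wire format
    payload: Layer | bytes       # either application data or another layer


def encapsulate(data: bytes, stack: list[tuple[str, dict]]) -> Layer | bytes:
    """Wrap data in each (protocol, header) pair, innermost layer first."""
    packet: Layer | bytes = data
    for protocol, header in stack:
        packet = Layer(protocol, header, packet)
    return packet


def decapsulate(packet: Layer | bytes) -> tuple[list[str], bytes]:
    """Peel layers from the outside in; return the protocol names and the inner data."""
    protocols: list[str] = []
    while isinstance(packet, Layer):
        protocols.append(packet.protocol)
        packet = packet.payload
    return protocols, packet


# A browser's HTTP request, wrapped in turn by the host's network stack.
request = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
frame = encapsulate(request, [
    ("TCP", {"dst_port": 80}),
    ("IP", {"dst": "192.0.2.10"}),
    ("Ethernet", {"dst_mac": "00:00:5e:00:53:01"}),
])
print(decapsulate(frame)[0])  # ['Ethernet', 'IP', 'TCP']
```

The outermost layer is the last one applied, which is why a receiver (or a sniffer) sees Ethernet first and reaches the HTTP data only after removing every lower layer.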

In order to analyze or monitor its users' traffic effectively, a company typically uses tools to: “sniff” or capture the packets traversing the network of interest; understand the protocol being used in the communication; analyze the data packets used in the communication; and draw conclusions based on information gained from this analysis. Conventional tools for analyzing network traffic include protocol analyzers, intrusion detection systems, application monitors, log consolidators, and combinations of these tools.

A conventional protocol analyzer can provide insight into the types of protocols being used on a network. The analysis tools within this analyzer enable it to decode protocols and examine individual packets. By examining individual packets, conventional protocol analyzers can determine where a packet came from, where it is going, and the data that it is carrying. Because it would be impractical to inspect every packet on a network by hand to see if security concerns exist, more specialized analysis products were created.

One example of a more specialized but conventional analysis tool is an Intrusion Detection System (IDS), which validates network packets based on a series of known signatures. If the IDS determines that certain packets are invalid or suspicious, the IDS will alert the company. Company employees, in some cases using additional analysis tools, must then analyze most of these alerts. This analysis can require extensive manpower and resources.

Another example of a more specialized but conventional analysis tool is an application monitor. Application monitors focus on specific application layer protocols to decide if illegal or suspicious activity is being performed. This conventional application monitor may focus, for example, on the Hypertext Transfer Protocol (HTTP) to monitor employee accesses to websites. When this monitor is used, such as when an employee visits a website, the company can monitor the packets transmitted and received between the employee's computer and the web server. These packets can be analyzed by parsing the HTTP protocol to determine the website's hostname, the name of the file requested, and the associated content that was retrieved. Thus, this HTTP analyzer could be used to decide if an employee is visiting inappropriate websites and to alert the company of this activity. This type of analysis tool monitors the actions of web browsers, but falls short for other types of communications.

Another conventional application monitor can monitor the Simple Mail Transfer Protocol (SMTP). This system could be used to record and track e-mails sent outside of the company to ensure employees were not sending trade secrets or intellectual property owned by the company. It could also ensure e-mails entering the corporation did not contain malicious attachments or viruses. Employees could, however, use other means of communication such as instant messaging, chat rooms, and website-based e-mail systems. Because this application monitor only monitors SMTP communications, companies must also use many other security and analytical tools to monitor network activity.

Another example of a more specialized but conventional analysis tool is a log consolidator system (LCS). The LCS processes log-based output from network applications or devices. These data inputs can include firewall logs, router logs, application logs such as web server or mail server logs, computer system logs, and/or IDS alerts. Typically, a specific LCS analysis tool is required for each different log format, which means multiple analysis systems are needed to cover the different types of log file formats.

While these and other conventional network analysis systems analyze communications of a particular protocol or format, they fail to analyze a broad range of protocols and formats. Thus, a company wishing to ensure the security of its network currently must purchase and maintain multiple network analysis systems. Further, with each new protocol or protocol change, companies must create, rewrite, upgrade, or repurchase at least one of their systems. The conventional method of using a patchwork of multiple analyzers is expensive and complex to maintain.

In addition, because of the many ways to communicate over a network and the many different analysis tools needed to perform network forensics, the conventional method makes it difficult to answer even simple questions such as “What is happening on my network?,” “Who is talking to whom?,” and “What resources are being accessed?” It is difficult because there is no limit as to which applications one can use. Each application introduced onto a network brings new protocols and requires new analytical tools to audit it. For example, there are many ways to send a file to another person using a network: e-mailing the document as an attachment using the SMTP protocol; transmitting the file using an instant messenger like MSN, AOL IM™, or Yahoo™ IM; uploading the file to a shared file server using the FTP protocol; web sharing the document using the HTTP protocol; or uploading the file directly using an intranet protocol like SMB or CIFS. All of these protocols are implemented differently, and special analysis tools are required to interpret each of them, resulting in a complex and expensive system.

The conventional analysis systems also fall short because personnel must be trained to use the numerous analysis tools needed to investigate network communications having many different protocols. This training is expensive. In addition, network analysis continues to become increasingly difficult due to the large number of new applications and protocols being introduced every year.

Other systems found outside of computer networks have similar issues regarding analysis. These issues can be found in “badge swipe” systems used to monitor the movement of persons in and out of a building, in traffic monitoring systems that monitor cars passing through radio frequency identification (RFID) toll points, in property monitoring systems that monitor video cameras and various motion sensors or other sensors, and in other contexts involving the collection and analysis of data of varying protocols or languages. Specific analytical tools must be developed for each collection system, making it difficult to cross-correlate events and perform analysis.

SUMMARY OF THE INVENTION

To address the foregoing problems and others associated with monitoring large volumes of data in numerous protocols, the present invention is directed to the conversion of network traffic containing multiple protocols into a common language suited for analysis. In addition, because data in multiple, disparate protocols may be described in a common language, a unique analysis logic or a protocol-specific analyzer will not be needed for every protocol, thereby significantly reducing the complexity associated with conventional systems.

In one aspect of the invention, the common language of the present invention permits any network transaction, regardless of the particular application or protocol, to be described.

In another aspect of the invention, common language descriptions are stored as “metadata,” which describes the communication. As used herein, the term “metadata” means information taken from a communication or associated with a communication that describes the communication. For example, metadata can include the communication's start time; stop time; size; protocols used; computers, entities, and resources involved; routing information; aliases of the computers, entities, and resources; properties of the communication; and other information useful to a person or computer analyzing the communication. Common language descriptions of the metadata describing a communication often require less than one percent of the storage space of the communication itself.

In another aspect of the invention, the common language is in the form of an event-based language that permits description of a communication in terms of its sessions, events, and properties.

In another aspect of the invention, protocol-specific data is parsed into an event-based language based on the nature of the transaction included within the data.

The present invention can be used in a variety of contexts, including transactions in a computer network, transactions in an application or device log file, transactions found on computer media, transactions in badge detectors, transactions generated by motion detectors, transactions generated in connection with phone calls, transactions generated in connection with credit card transactions, and other systems in which transactions occur according to one or more protocols. Generally, systems with communications using multiple protocols, formats, and/or application types can benefit from the invention.

Additional features and advantages of the present invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and advantages of the invention will be realized and attained by the structure and steps particularly pointed out in the written description, the claims and the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a system for analyzing network traffic in accordance with an embodiment of the present invention.

FIG. 2 is a schematic diagram illustrating the parser aspect of the present invention in greater detail.

FIG. 3 is a flow diagram of a method for analyzing data packets in accordance with an embodiment of the present invention.

FIG. 4 is a flow diagram of a method for analyzing session data in accordance with an embodiment of the present invention.

FIG. 5 is a schematic diagram of an event-based language in accordance with an embodiment of the present invention.

FIG. 6 is a flow diagram of a method for generating an event-based language from data packets in accordance with an embodiment of the present invention.

FIG. 7 illustrates an exemplary generation of an event-based language corresponding to an email session in accordance with the present invention.

FIG. 8 illustrates an exemplary generation of an event-based language corresponding to a file transfer session in accordance with the present invention.

FIG. 9a illustrates an exemplary generation and form of an event-based language in accordance with the present invention.

FIG. 9b illustrates an exemplary generation and form of an event-based language in accordance with the present invention.

FIGS. 9c and 9d illustrate two exemplary generations of an event-based language in accordance with the present invention.

FIG. 10 illustrates exemplary data conformed to an HTTP protocol in accordance with the present invention.

FIG. 11a illustrates exemplary data conformed to an SMTP protocol in accordance with the present invention.

FIG. 11b illustrates exemplary data conformed to an FTP protocol in accordance with the present invention.

FIG. 12a illustrates an exemplary generation of an event-based language in accordance with the present invention.

FIG. 12b illustrates an exemplary form of an event-based language in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a schematic diagram of a system for analyzing network traffic in accordance with an embodiment of the present invention. Generally, the embodiment of the present invention shown in FIG. 1 is a system configured to translate network communications, or input files containing network communications, into a common language for analysis. Specifically, this embodiment includes a system configured to input packets associated with communications across a network, assemble those packets into sessions, direct the sessions to appropriate parsers, parse the sessions into sessions in a common language, and communicate these common-language sessions to an analyzer.

For example, a protocol-specific parser in accordance with the present invention can convert protocol-specific data at any network level into a common language. The common language can be used to describe network layer communications including, for example: Ethernet, Token Ring, TCP/IP, IPX/SPX, AppleTalk™, IPv6, and other network layer protocols. The common language also can be used to describe application layer communications including, for example: SMTP, HTTP, TELNET, FTP, POP3, RIP, RPC, Lotus Notes™, TDS, TNS, IRC, DNS, SMB, NFS, DHCP, NNTP, instant messengers (AOL IM™, MSN, YAHOO™), and other application layer protocols. The common language can also be used to describe the content of communications including, for example: E-Mail messages, PGP, S/MIME, V-Card, HTML, images, and other content types.

In FIG. 1, a network 102 represents any network whereby communication between two or more entities may be made or monitored. Network 102 may be a simple network, for example, a cable connecting two computers, such as a computer 122 and a computer 124. Network 102 may be a complex network as well, such as a network configured to pass, allow passage of, or monitor communications between computers, servers, wireless computers, satellites, or other communication devices. For example, network 102 may represent intranets, extranets, and global networks including the Internet. For clarity in explaining, but not to limit, the function of network 102, FIG. 1 sets forth a limited number of communication devices communicating through or monitored by network 102: computer 122; computer 124; a server 126; and a wireless computer 128.

Typically, communications between entities across or monitored by network 102 are made in pieces, rather than as a complete transfer. In such cases, a complete communication between two entities is broken into multiple pieces, or “packets,” of data. Such packets conform to one or more protocols. As used herein, the terms “protocol or protocols,” depending on the context, refer to network protocols such as TCP/IP, IPX/SPX, or AppleTalk™, as well as application protocols, such as FTP, SMTP, HTTP, and so forth. In other words, the terms “protocol or protocols,” unless the context establishes a particular protocol, are intended to include any protocol in which data may be represented or transferred in any communication system.

A packet handler 104 is configured to monitor the many packets of data in network 102. For example, packet handler 104 can be a sniffer, such as EtherPeek™ available from WildPackets, Inc. In doing so, packet handler 104 is also configured to copy the packets in network 102. Packet handler 104 is also configured to send the packets to an assembler 106. Alternatively, assembler 106 may be configured to access the copied packets from packet handler 104. Packet handler 104 may also be configured to send the packets in real-time to assembler 106 without recording the packets. In any event, assembler 106 is configured to receive the packets of data representing communications in network 102. Packet handlers and assemblers may, in a preferred embodiment of the invention, be configured as set forth in copending U.S. patent application Ser. No. 09/552,878, filed Apr. 20, 2000, claiming the benefit of U.S. Provisional Application No. 60/131,904, filed Apr. 30, 1999, which is incorporated herein by reference in its entirety.

Assembler 106 is also configured to assemble the packets into the communication that the packets represent. Such communications are preferably assembled into sessions. Each session represents a communication between two or more entities. In an exemplary embodiment of the present invention, assembler 106 is configured to assemble the packets into a set of sessions 110. For example, the set of sessions 110 can include sessions 110a, 110b, 110c, and 110d. Sessions 110a, 110b, 110c, and 110d can conform to the same protocol or to different protocols. For example, one of the sessions, session 110b, conforms to the well-known HTTP application protocol.

Sessions can also be generated by other session sources 108. Other session sources 108 can generate sessions that conform to a specific application type or protocol. These sources typically do not require assembler 106 to reconstruct the network packets into a session. As shown in FIG. 1, for example, other session sources 108 may generate a session 110e. Session 110e conforms to a protocol, which may be, but need not be, the same as the protocol associated with one of the sessions of set of sessions 110.

Sessions generated by assembler 106 or another session source, such as other session sources 108, are transmitted (or input) to a parser director 112. Parser director 112 is configured to accept sessions generated by assembler 106 or other session sources 108. Parser director 112 directs each session to the one of a set of protocol-specific parsers 116 corresponding to the protocol of the session. Each protocol-specific parser in the set of protocol-specific parsers 116 is configured to receive sessions corresponding to its particular protocol. For example, protocol-specific parser 116a is configured to receive sessions conforming to the File Transfer Protocol (FTP). Protocol-specific parser 116b is configured to receive sessions conforming to the Telnet protocol. Protocol-specific parser 116c is configured to receive sessions conforming to the HTTP protocol. Protocol-specific parser 116d is configured to receive sessions conforming to the MS instant messaging protocol. Protocol-specific parser 116e is configured to receive sessions conforming to the Network News Transfer Protocol (NNTP). Protocol-specific parser 116f is configured to receive sessions conforming to the Simple Mail Transfer Protocol (SMTP). Thus, directed session 114c (related to session 110b) is directed to protocol-specific parser 116c because protocol-specific parser 116c is configured as an HTTP parser. As described in detail below, each protocol-specific parser is configured to produce a common language representation of each session that is input to it.
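The dispatch performed by parser director 112 can be sketched in Python, assuming the protocol of each session is already known (identifying it is a separate step). The `ParserDirector` class, its methods, and the two toy parsers are hypothetical illustrations, not the implementation described here.

```python
from typing import Callable, Dict, List

# Hypothetical parser signature: raw session bytes in, common-language events out.
Parser = Callable[[bytes], List[dict]]


class ParserDirector:
    """Routes each session to the parser registered for its protocol."""

    def __init__(self) -> None:
        self._parsers: Dict[str, Parser] = {}

    def register(self, protocol: str, parser: Parser) -> None:
        self._parsers[protocol.upper()] = parser

    def direct(self, protocol: str, session: bytes) -> List[dict]:
        parser = self._parsers.get(protocol.upper())
        if parser is None:
            raise LookupError(f"no parser registered for {protocol!r}")
        return parser(session)


def parse_telnet(session: bytes) -> List[dict]:
    # Toy stand-in for a Telnet-specific parser: one event per session.
    return [{"protocol": "TELNET", "event": "terminal_session", "bytes": len(session)}]


def parse_http(session: bytes) -> List[dict]:
    # Toy stand-in for an HTTP-specific parser: report the request line only.
    request_line = session.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return [{"protocol": "HTTP", "event": "request", "line": request_line}]


director = ParserDirector()
director.register("telnet", parse_telnet)
director.register("http", parse_http)
print(director.direct("HTTP", b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
```

In this sketch, supporting a new protocol only requires registering another parser; the director and anything that consumes its output stay unchanged.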

An analyzer 120 communicates with the output of any of the set of protocol-specific parsers 116. That is, analyzer 120 is configured to communicate with protocol-specific parsers 116 using the common language generated by each of the set of protocol-specific parsers 116. Thus, analyzer 120 can communicate with any of the protocol-specific parsers 116 regardless of the protocol of the sessions they are configured to handle. Consequently, using the common language output of protocol-specific parsers 116 eliminates the need to have a plurality of analyzers corresponding to each of the protocols, as required in conventional network analysis systems.

FIG. 2 is a schematic diagram illustrating the parser aspect of the present invention in greater detail. Directed sessions 114 are the sessions output by parser director 112 according to the protocol(s) of the sessions. Directed sessions 114 are directed to a set of protocol-specific parsers 116.

As shown in FIG. 2, directed sessions 114 generally conform to disparate protocols. For example, in the embodiment illustrated in FIG. 2, six sessions having different protocols are shown. The six protocols are FTP, Telnet, HTTP, MS Instant Messaging, NNTP, and SMTP. It would be apparent to those skilled in the art that the illustrated protocols are by way of example only. Any set of protocols could be represented. Each directed session output by parser director 112 is input to a protocol-specific parser configured to process the protocol associated with that session. For example, as illustrated in FIG. 2, FTP session 114a is input to an FTP-specific parser 116a. Telnet session 114b is input to Telnet-specific parser 116b. HTTP session 114c is input to HTTP-specific parser 116c. MS Instant Messaging session 114d is input to MS Instant Messaging-specific parser 116d. NNTP session 114e is input to NNTP-specific parser 116e. SMTP session 114f is input to SMTP-specific parser 116f.

Protocol-specific parsers 116 process their input in order to output data conformed to a protocol-independent common language. As used herein, the term “common language” means a language that can be used to represent network traffic conforming to multiple, disparate protocols. The content expressed in the form of the common language may be referred to herein as “metadata.” In an exemplary embodiment, the common language is an event-based language (described in greater detail below). For example, FTP-specific parser 116a outputs sessions in a common language 118a. Telnet-specific parser 116b outputs sessions in a common language 118b. HTTP-specific parser 116c outputs sessions in a common language 118c. MS Instant Messaging-specific parser 116d outputs sessions in a common language 118d. NNTP-specific parser 116e outputs sessions in a common language 118e. SMTP-specific parser 116f outputs sessions in a common language 118f.

FIG. 3 is a flow diagram of an embodiment of a method for analyzing network traffic in accordance with the present invention. Generally, this method is practiced by a system that collects, assembles, and parses data conformed to multiple protocols into data conformed to a common language. As would be known to those skilled in the art, many different elements, configurations, or combinations of elements can be used to implement the methods described below. For clarity, however, the following description of preferred methods of the invention uses many of the elements described in FIGS. 1 and 2. Moreover, the following describes an embodiment in which a single packet collector 1402 is operating. However, aspects of the instant invention may also be implemented using multiple packet collectors and at least one aggregator, as will be described in more detail later herein.

In step 302, packet handler 104 collects packets from network 102. Preferably, as part of collecting packets in step 302, packet handler 104 monitors communications comprising packets across network 102. In one embodiment of the present invention, packet handler 104 collects packets by copying them from the monitored communications across network 102. The collected packets can be stored in a file (not shown).

In step 304, packet handler 104 makes the collected packets available to assembler 106. Packet handler 104 can make the packets available to assembler 106 by storing the packets in a file that assembler 106 can access. In another exemplary embodiment, packet handler 104 makes the packets available to assembler 106 in real-time without recording the packets. In each of these embodiments, as part of step 304, assembler 106 receives the collected packets.

In step 306, assembler 106 assembles the packets into sessions. These sessions preferably consist of packets of the same network protocol and, preferably, the same source/target addresses found in each network layer. In step 308, assembler 106 communicates the sessions, which conform to one or more protocols, to parser director 112. Alternatively, parser director 112 may actively capture sessions 110 from assembler 106.
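Step 306 can be sketched in Python under simplifying assumptions: each packet already carries its decoded protocol, addresses, ports, and a sequence offset, and there are no retransmissions, overlaps, or losses to repair. The `Packet` type and the `assemble_sessions` function are hypothetical. Keying on the direction of travel mirrors the example given for FIG. 5, in which a client's requests and the server's responses form separate sessions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# (protocol, source address, source port, target address, target port)
FlowKey = Tuple[str, str, int, str, int]


@dataclass(frozen=True)
class Packet:
    protocol: str
    src: str
    sport: int
    dst: str
    dport: int
    seq: int        # byte offset of this payload within its flow
    payload: bytes


def assemble_sessions(packets: Iterable[Packet]) -> Dict[FlowKey, bytes]:
    """Group packets sharing protocol and source/target addresses, then
    concatenate each group's payloads in sequence order."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.protocol, p.src, p.sport, p.dst, p.dport)].append(p)
    return {
        key: b"".join(p.payload for p in sorted(flow, key=lambda p: p.seq))
        for key, flow in flows.items()
    }
```

Packets may arrive in any order; sorting on the sequence offset restores the original byte stream of each session before it is handed to the parser director.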

In step 310, parser director 112 directs assembled sessions to protocol-specific parsers 116. In an exemplary embodiment, parser director 112 performs protocol matching and lexical analysis of the session content to decide to which of protocol-specific parsers 116 each assembled session should be directed.
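Protocol matching combined with lexical analysis might look like the following hypothetical Python sketch, which is not the parser director's actual logic. It checks the opening bytes of a client-to-server session against a few hand-written signatures and falls back to the well-known destination port. The signature list and port table are illustrative assumptions.

```python
import re

# Illustrative client-side signatures; real traffic needs far richer matching.
SIGNATURES = [
    ("HTTP", re.compile(rb"(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r?\n")),
    ("SMTP", re.compile(rb"(HELO|EHLO) ", re.IGNORECASE)),
    ("FTP", re.compile(rb"(USER|PASS) ", re.IGNORECASE)),
]

# Fallback when no content signature matches: conventional server ports.
WELL_KNOWN_PORTS = {21: "FTP", 23: "TELNET", 25: "SMTP", 80: "HTTP", 119: "NNTP"}


def identify_protocol(payload: bytes, dst_port: int) -> str:
    """Prefer lexical evidence from the session content; fall back to the port."""
    for name, pattern in SIGNATURES:
        if pattern.match(payload):  # match() anchors at the start of the payload
            return name
    return WELL_KNOWN_PORTS.get(dst_port, "UNKNOWN")
```

Content is checked before the port because it still identifies, for example, HTTP running on a non-standard port such as 8080, where a port-only rule would fail.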

In step 312, protocol-specific parsers 116 receive directed sessions 114 from parser director 112. In step 314, protocol-specific parsers 116 output the parsed sessions in the common language. As described above, each of protocol-specific parsers 116 operates on sessions that conform to the protocol the parser is configured to parse. If there is more than one protocol present in the session data presented to parser director 112, preferably there will be a protocol-specific parser for each protocol present in the session data. The protocol-specific parsers output a common language representation of the session data input to them. Preferably, the protocol-specific parsers parse metadata representative of the session data. Also preferably, the metadata conforms to the common language.

In step 316, protocol-specific parsers 116 submit the common language data to an analyzer. Protocol-specific parsers 116 can also record common language data to a record (or log). Also as part of step 316, protocol-specific parsers 116 or analyzer 120 may access the common language data from the record. If protocol-specific parsers 116 access the common language data from the record, protocol-specific parsers 116 then communicate the common language data to analyzer 120.
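One possible form for the record (or log) of step 316 is a JSON Lines file, sketched here in Python. The encoding and the `write_record` and `read_record` helpers are assumptions for illustration; the text does not specify how the record is stored.

```python
import io
import json
from typing import Iterable, List, TextIO


def write_record(log: TextIO, events: Iterable[dict]) -> None:
    """Append common-language events to a log, one JSON object per line."""
    for event in events:
        log.write(json.dumps(event, sort_keys=True) + "\n")


def read_record(log: TextIO) -> List[dict]:
    """Read the events back so they can be forwarded to an analyzer."""
    return [json.loads(line) for line in log if line.strip()]


log = io.StringIO()  # stands in for a log file on disk
write_record(log, [{"protocol": "FTP", "event": "get", "file": "plan.doc"}])
log.seek(0)
print(read_record(log))
```

A line-oriented format lets the parsers append records as sessions finish while a reader later replays them in order.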

In step 318, analyzer 120 analyzes data conformed to the common language. Preferably, only one analyzer 120 is used to analyze all of the common language data. In an exemplary embodiment, only one analyzer using one analysis logic is needed to analyze the communications represented by the sessions because the communications are conformed to the common language rather than to disparate protocols. In an exemplary embodiment, analyzer 120 is a workstation-based system having a graphical user interface (GUI) for formulating queries and performing other analyses on the database. In another exemplary embodiment, analysis tools, such as those included in analyzer 120, do not have to be changed when protocols are added or changed, because protocol-specific parsers 116 can be modified or added to the system instead. Sessions parsed into metadata in the common language are described in an exemplary embodiment as common language data in FIGS. 1 and 2 and as common-language sessions or sessions in common language herein.
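Because every parser emits the same fields, one piece of analysis logic can answer a question such as “Who is talking to whom?” across HTTP, FTP, and SMTP alike. The Python sketch below assumes hypothetical common-language field names (`protocol`, `source`, `destination`); the text does not fix a schema.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def who_talks_to_whom(events: Iterable[dict]) -> Tuple[Counter, Dict[Pair, Set[str]]]:
    """Count events per (source, destination) pair and note which protocols
    each pair used. One query serves every protocol because all events
    share the same common-language fields."""
    counts: Counter = Counter()
    protocols: Dict[Pair, Set[str]] = defaultdict(set)
    for e in events:
        pair = (e["source"], e["destination"])
        counts[pair] += 1
        protocols[pair].add(e["protocol"])
    return counts, dict(protocols)


events = [
    {"protocol": "HTTP", "source": "10.0.0.5", "destination": "192.0.2.10"},
    {"protocol": "FTP", "source": "10.0.0.5", "destination": "192.0.2.10"},
    {"protocol": "SMTP", "source": "10.0.0.7", "destination": "198.51.100.3"},
]
counts, protocols = who_talks_to_whom(events)
print(counts.most_common(1))
```

Adding a new protocol parser in this design requires no change to the query, which is the property the paragraph above attributes to analyzer 120.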

FIG. 4 is a flow diagram of another embodiment of a method for analyzing network communications in accordance with the present invention. Generally, the method comprises steps for parsing information from sessions conforming to one or more protocols into metadata conforming to a common language. Many different elements, configurations, or combinations of elements can be used to implement the methods described below. For clarity, however, the following description of preferred methods of the invention uses many of the elements set forth in FIGS. 1 and 2.

In step 402, protocol-specific parsers 116 receive directed sessions 114. Each parser of protocol-specific parsers 116 receives only directed sessions 114 that conform, at least in part, to the protocol that the receiving protocol-specific parser is configured to parse. For example, parser 116b is configured to parse sessions conformed to the Telnet protocol. Thus, parser 116b receives any session that conforms, at least in part, to the Telnet protocol (see FIG. 2).

In step 404, protocol-specific parsers 116 extract information from directed sessions 114. If desired, the extracted information can be stored in step 405. In step 406, protocol-specific parsers 116 translate the extracted information into a common language. For example, Telnet-specific parser 116b extracts session data conforming to the Telnet protocol and translates that data into the common language.

Preferably, in step 404, protocol-specific parsers 116 carefully extract only information generally useful in analyzing the communication(s) that each session represents. By extracting only a portion of the information, this embodiment of the present invention creates a common language 118 representation of the session data that is significantly smaller than directed sessions 114 or sessions 110. Consequently, these representations are cheaper and more efficient to store. Moreover, the common language data is more quickly and easily analyzed due to its significantly smaller size.
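To illustrate steps 404 and 406 for one protocol, the following hypothetical Python parser keeps only a few fields of an HTTP request and emits them in an assumed session/events/properties layout. It handles a single, complete request only; responses, pipelined requests, and chunked bodies are outside the scope of this sketch.

```python
from typing import Dict


def parse_http_request(session: bytes, client: str, server: str) -> Dict:
    """Reduce one HTTP request to a small common-language record.

    Keeps only the method, host, resource, and body size; the remaining
    headers and the body itself are discarded, which keeps the record small.
    """
    head, _, body = session.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, resource, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "session": {"protocol": "HTTP", "client": client, "server": server},
        "events": [{
            "type": "request",
            "properties": {
                "method": method,
                "host": headers.get("host"),
                "resource": resource,
                "body_bytes": len(body),
            },
        }],
    }
```

Only the request line and the `Host` header survive; headers such as `User-Agent` are parsed but not copied into the record, so its size stays roughly constant however large the original request was.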

In step 408, protocol-specific parsers 116 communicate the sessions in common language 118. If the common language data is not to be stored in a database, as determined in step 410, protocol-specific parsers 116 may communicate the sessions in common language 118 to analyzer 120 one at a time or in groups. In step 412, analyzer 120 analyzes the sessions in common language 118. In this exemplary embodiment, a single analyzer 120 analyzes all of the sessions in common language 118. Alternatively, if the common language data is to be stored in a database, one or more database records for storing the common language data are created in step 414. An analyzer such as analyzer 120 can later access the database to analyze the data.
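One way to picture the branch at step 410 is the Python sketch below. It assumes, purely for illustration, that each common-language record is a `(session_id, statement)` pair, that storage is an SQLite table named `common_language`, and that an analyzer is any callable accepting a batch of records; none of these names come from the figures.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical record shape: (session id, common-language statement).
Record = tuple[int, str]


def handle_records(
    records: list[Record],
    store: bool,
    conn: Optional[sqlite3.Connection] = None,
    analyzer: Optional[Callable[[list[Record]], None]] = None,
    batch_size: int = 1,
) -> None:
    """Step 410: store records in a database (step 414) or pass them to an analyzer (step 412)."""
    if store:
        if conn is None:
            raise ValueError("a database connection is required when storing")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS common_language (session_id INTEGER, statement TEXT)"
        )
        conn.executemany("INSERT INTO common_language VALUES (?, ?)", records)
        conn.commit()
        return
    if analyzer is None:
        raise ValueError("an analyzer is required when not storing")
    # batch_size=1 sends sessions one at a time; larger values send them in groups.
    for start in range(0, len(records), batch_size):
        analyzer(records[start:start + batch_size])
```

Stored records remain available to any later analyzer through an ordinary SQL query, which is the deferred-analysis path described for step 414.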

FIG. 5 is a schematic diagram of another embodiment of a system for analyzing network traffic in accordance with the present invention. Generally, this embodiment shows an exemplary common language, called an event-based language, into which network communications, or input files containing communications, are translated in preparation for analysis.

Preferably, event-based language 502 follows a taxonomy of session 504, events 506, and properties 508. In an exemplary embodiment, event-based language 502 further comprises aliases 510 and routes 512. Under the sessions-events-properties taxonomy, each session corresponds to one or more network events. In one embodiment, sessions may be used to group events per computer per application. For example, a computer communicating with a server through a Netscape browser can be one session, and the server's response to the computer can be another session. Sessions can also group events in other ways, for example, to accommodate so-called “portjumping” protocols. In another embodiment, sessions can encompass other sessions in a directory-like structure.

Events 506 can be described in terms of the entities 514 involved in each event. Generally, each event of events 506 corresponds to a communication between at least two entities 514. Each event of events 506 can also be described in terms of various properties 508 associated with it. In an exemplary embodiment, each event of events 506 can further be described in terms of aliases 510 of the entities 514 involved in the event and routes 512 associated with the event. In an exemplary embodiment, aliases 510 of entities 514 can be recorded as properties of each entity (not shown in FIG. 5), and routes 512 can be recorded as indirect events of session 504.

In an exemplary embodiment, each session (e.g., a network transaction or other communication) can be converted to a standard set of outputs. For example, a protocol-specific parser, such as one of protocol-specific parsers 116, may provide two basic outputs: events 506 and properties 508. Thus, the metadata describing sessions involving a variety of protocols can be stored in as few as two basic tables, which is a significant benefit of the present invention over prior approaches. In this exemplary embodiment, the metadata conforming to the event-based language can be stored in a log or record having as few as two columns.
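To make the two-table idea concrete, here is a hedged SQLite sketch. The table layouts and column names are assumptions chosen for illustration; what it shows is that an SMTP session and an FTP session land in the same `events` and `properties` tables, with session-level properties distinguished by a NULL `event_id`.

```python
import sqlite3

# Hypothetical two-table schema shared by every protocol.
SCHEMA = """
CREATE TABLE events (
    event_id   INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL,
    source     TEXT NOT NULL,
    action     INTEGER NOT NULL,
    target     TEXT NOT NULL
);
CREATE TABLE properties (
    session_id INTEGER NOT NULL,
    event_id   INTEGER,          -- NULL means the property belongs to the session itself
    name       TEXT NOT NULL,
    value      TEXT NOT NULL
);
"""


def store_session(
    conn: sqlite3.Connection,
    session_id: int,
    session_props: dict[str, str],
    events: list[tuple[str, int, str, dict[str, str]]],
) -> None:
    """Write one parsed session, whatever its protocol, into the same two tables."""
    for name, value in session_props.items():
        conn.execute(
            "INSERT INTO properties VALUES (?, NULL, ?, ?)", (session_id, name, value)
        )
    for source, action, target, props in events:
        cur = conn.execute(
            "INSERT INTO events (session_id, source, action, target) VALUES (?, ?, ?, ?)",
            (session_id, source, action, target),
        )
        # Event properties point back at the event row just inserted.
        for name, value in props.items():
            conn.execute(
                "INSERT INTO properties VALUES (?, ?, ?, ?)",
                (session_id, cur.lastrowid, name, value),
            )
    conn.commit()
```

For example, storing a one-event SMTP session and a one-event FTP session yields two rows in `events`, and every session or event property becomes one row in `properties`, regardless of protocol.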

FIG. 5 illustrates an exemplary structure of the event-based language as applied to transactions in a computer network. Preferably, each transaction is grouped in a single session 504 and can be described in terms of one or more of events 506, properties 508, aliases 510, and routes 512. In the embodiment set forth in FIG. 5, an entity of entities 514 can be one of three types: a computer 522, a user 520, or a resource 524. For example, an entity that is computer 522 could be a host, a server, a desktop, a laptop, and so forth. Computer 522 could be identified by a network address, a computer name, a host name, a port number, and so forth. Computer 522 can be within network 102 (FIG. 1), within another network that is being accessed, or outside both network 102 and the other network.

User 520 can be an individual, such as an authorized user on a computer network. User 520 may be identified by an e-mail address, a local area network (LAN) user name, the “Full Name” (real name) of the user, a handle or other name used to identify user 520, and so forth.

Resource 524 may be a resource that is accessed or used during an event. For example, resource 524 may be a file, data from within a database, or a message from a shared bulletin board. Resource 524 can also be a container of other resources, such as a file system directory structure, a database, tables in a database, or a shared bulletin board. Examples of entity types, such as resource 524, computer 522, and user 520, and corresponding numerical representations are:

- 100, “IP”;
- 101, “IP-PORT”;
- 102, “IP-USER”;
- 103, “IP-RESOURCE”;
- 200, “HOST”;
- 201, “HOST-PORT”;
- 202, “HOST-USER”;
- 203, “HOST-RESOURCE”; and
- 300, “GROUP.”
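In code, the entity-type list above maps naturally onto an integer enumeration. The sketch below is illustrative only: the Python member names are our own transliteration of the listed labels, and the hundreds-digit grouping in `base_kind` is a pattern observed in the listed values, not a rule stated in the specification.

```python
from enum import IntEnum


class EntityType(IntEnum):
    """Entity-type codes from the list above (member names are an illustrative choice)."""
    IP = 100
    IP_PORT = 101
    IP_USER = 102
    IP_RESOURCE = 103
    HOST = 200
    HOST_PORT = 201
    HOST_USER = 202
    HOST_RESOURCE = 203
    GROUP = 300

    @property
    def label(self) -> str:
        """The listed text label, e.g. "IP-PORT"."""
        return self.name.replace("_", "-")


def base_kind(code: int) -> str:
    """Return the addressing family implied by the hundreds digit of a listed code."""
    return {1: "IP", 2: "HOST", 3: "GROUP"}[int(code) // 100]
```

Storing the integer rather than the label keeps log rows compact, while `EntityType(code).label` recovers the readable form for display.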

In the exemplary embodiment set forth in FIG. 5, the common language is represented by an event-based language. The event-based language permits events on a computer network to be described using so-called event statements. An event can refer to a transaction between or involving differing types of entities, such as the following interactions between entities: computer→computer, user→computer, user→user, user→resource, and so forth.

An event statement 526 describes an action taken by one entity with respect to at least one other entity using a service. Thus, each event statement 526 preferably comprises two parameters: (1) one or more entities 514; and (2) an action 516.

A session statement 534 describes a session. As such, each session statement 534 includes some facts about session 504. In an exemplary embodiment, session statement 534 includes the times at which session 504 began and ended, the size of session 504 (e.g., 1.5 MB), and a service type 518 of the session. Generally, a service type (sometimes referred to herein as a “service” or “application”) refers to or is related to a protocol or application used during network communications. A property statement 528 preferably includes facts about either session 504 or event 506. In an exemplary embodiment in which event 506 includes an e-mail communication, property statement 528 can include the subject line of the e-mail. A route statement 532 preferably includes facts about the route that an event traveled. An alias statement 530 preferably includes information regarding the identity of user 520, computer 522, or resource 524.
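The statement types described above can be modeled as plain data classes. This is a minimal sketch: the class and field names are hypothetical, and it assumes session times are recorded as `datetime` values and sizes in bytes.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PropertyStatement:
    name: str
    value: str


@dataclass
class RouteStatement:
    from_computer: str
    to_computer: str


@dataclass
class AliasStatement:
    entity: str
    alias: str


@dataclass
class EventStatement:
    # The two core parameters: entities (source and target sides) and an action.
    sources: list[str]
    action: str
    targets: list[str]
    properties: list[PropertyStatement] = field(default_factory=list)
    routes: list[RouteStatement] = field(default_factory=list)
    aliases: list[AliasStatement] = field(default_factory=list)


@dataclass
class SessionStatement:
    began: datetime
    ended: datetime
    size_bytes: int
    service: str
    properties: list[PropertyStatement] = field(default_factory=list)
    events: list[EventStatement] = field(default_factory=list)

    def duration_seconds(self) -> float:
        """Elapsed time between the session's start and end."""
        return (self.ended - self.began).total_seconds()
```

Nesting events inside the session mirrors the session-events-properties taxonomy; a flat table layout can be derived from these objects when the data is logged.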

Examples of actions that might be logged into a record using the event-based language for network-level communications include an ETHERNET transaction, an IP transaction, or a TCP transaction. Examples of actions that might be logged into a record at the application level include: a “user login” (a user attempting or obtaining access to a system), a “user logoff,” a “get resource” (e.g., getting or acquiring a resource, such as downloading a file or selecting a database row), a “put resource” (e.g., performing an operation using a resource, such as saving a file, uploading a file, or inserting a database row), a “delete resource” (e.g., removing a resource, such as deleting a file or database row), a “send message” (e.g., sending an e-mail or an Instant Message), a “receive message” (e.g., receiving an e-mail or an Instant Message), a “read message” (e.g., opening an e-mail or an Instant Message to read it), a “database query request” (e.g., a client issuing a request to a database), and a “database query response” (e.g., a server providing a response to the client's request). Examples of actions that can be logged into a record in an exemplary system, with corresponding numerical representations, are:

- 1, “IP Transaction”;
- 10, “User Login”;
- 11, “User Logoff”;
- 20, “Get Resource”;
- 21, “Put Resource”;
- 22, “Delete Resource”;
- 30, “Send MSG”;
- 31, “Receive MSG”;
- 32, “Read MSG”;
- 33, “Delete MSG”;
- 40, “Database Query”;
- 110, “User Login Response”;
- 111, “User Logoff Response”;
- 120, “Get Resource Response”;
- 121, “Put Resource Response”;
- 122, “Delete Resource Response”;
- 130, “Send MSG Response”;
- 131, “Receive MSG Response”;
- 132, “Read MSG Response”; and
- 140, “Database Query Response.”
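The action codes can likewise be held in an enumeration. Every listed response code equals its request code plus 100 (for example, 30 “Send MSG” and 130 “Send MSG Response”), while some requests, such as 33 “Delete MSG,” have no listed response. The helper below relies only on that observed pattern; the names are illustrative, not part of the specification.

```python
from enum import IntEnum
from typing import Optional


class Action(IntEnum):
    """Action codes from the list above (member names are an illustrative choice)."""
    IP_TRANSACTION = 1
    USER_LOGIN = 10
    USER_LOGOFF = 11
    GET_RESOURCE = 20
    PUT_RESOURCE = 21
    DELETE_RESOURCE = 22
    SEND_MSG = 30
    RECEIVE_MSG = 31
    READ_MSG = 32
    DELETE_MSG = 33
    DATABASE_QUERY = 40
    USER_LOGIN_RESPONSE = 110
    USER_LOGOFF_RESPONSE = 111
    GET_RESOURCE_RESPONSE = 120
    PUT_RESOURCE_RESPONSE = 121
    DELETE_RESOURCE_RESPONSE = 122
    SEND_MSG_RESPONSE = 130
    RECEIVE_MSG_RESPONSE = 131
    READ_MSG_RESPONSE = 132
    DATABASE_QUERY_RESPONSE = 140


def response_for(action: Action) -> Optional[Action]:
    """Return the listed response code (request code + 100), or None if none is listed."""
    try:
        return Action(int(action) + 100)
    except ValueError:
        return None


def is_response(action: Action) -> bool:
    """All listed response codes are 100 or greater."""
    return action >= 100
```

Pairing requests with responses this way lets an analyzer notice, for instance, a login request that never received a login response.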

Other values for actions can be used to tailor the common language to a particular computer network or to accommodate new applications. Generally, the library of actions is sufficient to describe actions, such as action 516, taken in connection with a communication between two entities, such as entities 514.

Examples of services that might be logged into a record using the common language include: File Transfer Protocol (FTP), TELNET, Simple Mail Transfer Protocol (SMTP), Domain Name Service (DNS), Hypertext Transfer Protocol (HTTP), POP3, Network News Transfer Protocol (NNTP), Server Message Block (SMB), MSSQL™/Sybase™ Database protocol (e.g., TDS), Oracle™ Database Protocol (e.g., TNS), Lotus Notes™, Dynamic Host Configuration Protocol (DHCP), Remote Procedure Call (RPC), Routing Information Protocol (RIP), Network File System (NFS), and Instant Messenger Protocols (AOL™, MSN, Yahoo™, etc.). Examples of services that can be logged into a record in an exemplary system, with corresponding numerical representations, are:

- 21, “Ftp”;
- 23, “Telnet”;
- 25, “E-Mail (SMTP)”;
- 53, “Domain Name Service”;
- 67, “DHCP”;
- 5190, “AOL™ Instant Msg”;
- 5050, “Yahoo™ Instant Msg”;
- 80, “WWW”;
- 109, “E-Mail (POP-2)”;
- 110, “E-Mail (POP-3)”;
- 119, “News”;
- 135, “Microsoft RPC”;
- 137, “Netbios™”;
- 139, “MS File Access”;
- 161, “SNMP”;
- 520, “RIP”;
- 1122, “MS Instant Msg”;
- 1352, “Lotus Notes™”;
- 1362, “Sybase™ Database”;
- 1433, “MSSQL™ Database”;
- 1521, “Oracle™ Database”;
- 1533, “Lotus Sametime™”;
- 2049, “Unix™ File Access”; and
- 6667, “IRC.”
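Most of the service codes above coincide with the well-known port numbers of the corresponding protocols (21 for FTP, 25 for SMTP, 80 for HTTP, and so on). The sketch below therefore guesses a session's service from its ports. This is an assumption made for illustration: a port heuristic cannot recognize services on nonstandard ports or the “portjumping” protocols mentioned earlier, which is one reason a parser may need to inspect the payload instead. `SERVICES` and `classify_session` are hypothetical names.

```python
# Service codes from the list above, keyed by their numeric value.
SERVICES: dict[int, str] = {
    21: "Ftp",
    23: "Telnet",
    25: "E-Mail (SMTP)",
    53: "Domain Name Service",
    67: "DHCP",
    80: "WWW",
    109: "E-Mail (POP-2)",
    110: "E-Mail (POP-3)",
    119: "News",
    135: "Microsoft RPC",
    137: "Netbios",
    139: "MS File Access",
    161: "SNMP",
    520: "RIP",
    1122: "MS Instant Msg",
    1352: "Lotus Notes",
    1362: "Sybase Database",
    1433: "MSSQL Database",
    1521: "Oracle Database",
    1533: "Lotus Sametime",
    2049: "Unix File Access",
    5050: "Yahoo Instant Msg",
    5190: "AOL Instant Msg",
    6667: "IRC",
}


def classify_session(src_port: int, dst_port: int) -> str:
    """Guess the service from the ports, checking the destination (server) port first."""
    for port in (dst_port, src_port):
        if port in SERVICES:
            return SERVICES[port]
    return "Unknown"
```

Checking both endpoints lets the same function label the server-to-client direction of a conversation, which the embodiment may treat as a separate session.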

Other values for services can be used in order to tailor the event-based language to accommodate new applications and protocols.

Using the two parameters (entities 514 and action 516), event statement 526 can be expressed in the form: <ENTITY1> was seen <ACTION> to <ENTITY2>. In an exemplary embodiment, event statement 526 can also include service type 518, as shown in FIG. 9 a. For an event of events 506 involving two entities of entities 514, one at the “source” end and one at the “target” end, event statement 526 then takes the form: <ENTITY1> was seen <ACTION> to <ENTITY2> with <SERVICE TYPE>. For an event involving multiple entities of entities 514 at each end, event statement 526 can be expressed as: <ENTITY1A, ENTITY1B> was seen <ACTION> to <ENTITY2A, ENTITY2B> with <SERVICE TYPE>, also as shown in FIG. 9 a.

For example, event 506 for a first user (TODD) of entities 514 sending an e-mail to a second user (DAMON) of entities 514 can be expressed by event statement 526 conforming to the following form: <USER TODD> was seen <SENDING MESSAGE> to <USER DAMON> with <SMTP>, as shown in FIG. 9 a.
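Rendering the event-statement form is straightforward string assembly. A minimal Python sketch follows; the function name is hypothetical, and it assumes entities are passed as already-formatted strings such as `"USER TODD"`.

```python
from typing import Optional


def event_statement(
    sources: list[str], action: str, targets: list[str], service: Optional[str] = None
) -> str:
    """Render '<ENTITY1> was seen <ACTION> to <ENTITY2> [with <SERVICE TYPE>]'.

    Multiple entities on one end are joined inside a single pair of brackets.
    """
    text = f"<{', '.join(sources)}> was seen <{action}> to <{', '.join(targets)}>"
    if service is not None:
        text += f" with <{service}>"
    return text
```

Passing `["USER TODD"]`, `"SENDING MESSAGE"`, `["USER DAMON"]`, and `"SMTP"` reproduces the TODD-to-DAMON statement above; omitting the service yields the shorter two-parameter form.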

Also for example, event 506 for a user (TODD) of entities 514 using a first computer to receive, via File Transfer Protocol (FTP), a file containing a password stored on a second computer can be expressed by event statement 526 conforming to the following form: <COMPUTER 192.168.1.2, USER TODD> was seen <GETTING RESOURCE> from <COMPUTER 192.168.1.1, RESOURCE:/etc/passwd> using <FTP>, as shown in FIG. 9 a.

Protocol-specific parsers 116 (FIGS. 1 and 2) do not have to output events in the format of event statement 526. Preferably, however, protocol-specific parsers 116 extract and output three parameters that can form event statement 526: entities, action, and service type. These basic parameters can be stored and, if desired, displayed in event statement format for a readily comprehended metadata description of the event, or in some other format.

Each event 506 may also have properties associated with it. For example, event 506 corresponding to an e-mail (e.g., referring to the action and service types listed above, the action type “SEND_MSG” and the service “E-mail (SMTP)”) may have associated properties such as the subject line of the e-mail (“IMPORTANT INFORMATION, PLEASE READ”), the sender password (“test12”), and the application used for the action (“Outlook Express”). FIG. 9 b illustrates an exemplary property name-value pair format for storing properties associated with an event. FIG. 9 b shows three name fields, “subject,” “password,” and “application,” and their corresponding values, “IMPORTANT INFORMATION, PLEASE READ,” “test12,” and “Outlook Express.” Other property types or fields could be included, such as the size of the event, the time of the event, file attachments, the full names of the sender and all recipients, and so forth.
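The name-value format lends itself to a simple flattening step. In this hedged sketch, `property_rows` and the `(event id, name, value)` row shape are hypothetical, and the `attachment` property in the example is invented to show how a multi-valued field (such as several file attachments) could map onto repeated rows with the same name.

```python
from typing import Iterable, Union

# Hypothetical row shape: (event id, property name, property value).
PropertyRow = tuple[int, str, str]


def property_rows(event_id: int, props: dict[str, Union[str, Iterable[str]]]) -> list[PropertyRow]:
    """Flatten one event's properties into name-value rows.

    A multi-valued property (e.g. several attachments) becomes several rows sharing one name.
    """
    rows: list[PropertyRow] = []
    for name, value in props.items():
        values = [value] if isinstance(value, str) else list(value)
        rows.extend((event_id, name, v) for v in values)
    return rows
```

For example, `property_rows(7, {"subject": "IMPORTANT INFORMATION, PLEASE READ", "password": "test12", "application": "Outlook Express"})` produces the three pairs of FIG. 9 b, each tagged with event 7.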

Each event, such as event 506, may also have associated routes, such as route 512. Route 512 refers to network communication information that may be carried within captured data but that was not directly observed in collecting the data. For example, a collected e-mail may include a list or log of the servers through which the e-mail message passed. This internal routing information, while not directly observed, can be extracted and stored. FIG. 9 c illustrates an exemplary format for capturing the routing information: <COMPUTER ENTITY> to <COMPUTER ENTITY>. Event 506 may have multiple routes 512 corresponding to multiple route statements, each like the one shown in FIG. 9 c.
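For e-mail, the servers a message passed through are commonly recorded in its `Received:` header fields, with each relay adding its own field at the top. The sketch below, a simplification that assumes each field contains a `from <host> ... by <host>` clause, turns those fields into <COMPUTER ENTITY> to <COMPUTER ENTITY> pairs in travel order using Python's standard `email` package. Real `Received:` fields vary widely, so a production parser would need to be more forgiving; `routes_from_email` is a hypothetical name.

```python
import re
from email import message_from_string

# Assumed shape of a Received field: "from <host> ... by <host> ...".
RECEIVED_HOP = re.compile(r"from\s+([^\s;]+).*?by\s+([^\s;]+)", re.IGNORECASE | re.DOTALL)


def routes_from_email(raw_message: str) -> list[tuple[str, str]]:
    """Return (from-computer, to-computer) route pairs recorded in Received headers.

    Each relay prepends its header, so the headers are reversed to list hops in travel order.
    """
    msg = message_from_string(raw_message)
    hops = []
    for header in reversed(msg.get_all("Received", [])):
        match = RECEIVED_HOP.search(header)
        if match:  # fields that do not fit the assumed shape are skipped
            hops.append((match.group(1), match.group(2)))
    return hops
```

Each returned pair corresponds to one route statement; because these hops were reported by the relays rather than observed on the wire, they fit the "not directly observed" character of route 512.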

Each event, such as event 506, may also have associated aliases, such as alias 510. Aliases 510 are names or values that describe the same entity (e.g., a computer or a user). For example, event 506 may involve a computer entity, such as computer 522, defined by the IP address “192.168.1.12.” Event 506 may also involve a user entity, such as user 520, defined by the e-mail address “todd@forensicsexplorers.com.” Computer 522 may be correlated to the alias “forensicsexplorer.com” and user 520 may be correlated to the alias “Todd Moore.” FIG. 9 d illustrates an exemplary format for storing alias information for events. Thus, when event 506 is extracted, the present invention allows the observed entities 514 to be correlated to known aliases 510. This information can be stored and associated with event 506 for later review and/or processing.
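Correlating observed entities with known aliases can be sketched as a table of name groups, where every name in a group refers to the same entity. The class below is a hypothetical illustration, not the storage format of FIG. 9 d; merging groups whenever a new alias arrives lets aliases chain, so an e-mail address, a full name, and a login handle can all resolve to one user.

```python
class AliasTable:
    """Hypothetical store of names known to describe the same entity."""

    def __init__(self) -> None:
        # Maps every known name to the (shared) set of names in its group.
        self._group: dict[str, set[str]] = {}

    def add_alias(self, name: str, alias: str) -> None:
        """Record that `name` and `alias` describe the same entity, merging their groups."""
        merged = self._group.get(name, {name}) | self._group.get(alias, {alias})
        for member in merged:
            self._group[member] = merged

    def aliases_of(self, name: str) -> set[str]:
        """All other known names for the entity, or an empty set if none are known."""
        return self._group.get(name, {name}) - {name}
```

Looking up each observed entity when an event is extracted, for example `aliases_of("192.168.1.12")`, supplies the alias statements that can be stored alongside the event.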

To create event statements or otherwise generate metadata, the invention parses information from each session or other communication data. In an exemplary embodiment, using for purposes of clarity the elements of FIGS. 1 and 2, the invention parses information following the method set forth in FIG. 6.

FIG. 6 provides a flow diagram of an exemplary method for converting sessions into the event-based language. As described above, the event-based language is one example of a common language according to the present invention. In an exemplary embodiment intended to reduce the number of tables in a metadata log, the step of identifying event routes may comprise treating an identified route as an “indirect event.” In this embodiment, the step of identifying aliases may comprise treating an identified alias as a property of an entity. This can permit routes to be stored in the events table and aliases in the properties table. By treating routes and aliases as events and properties, respectively, the number of tables required for a log or file of the sessions can be reduced.
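The table-reduction idea of this embodiment can be sketched as a function that emits only two kinds of rows. The row shapes, the `"ROUTE"` action label, and the `"indirect"` marker are assumptions for illustration: routes become indirect events alongside the directly observed ones, and aliases become properties of the entity they describe.

```python
# Hypothetical row shapes for the two remaining tables.
EventRow = tuple[int, str, str, str, str]  # (session id, source, action, target, kind)
PropertyRow = tuple[str, str, str]  # (owner, property name, value)


def fold_into_two_tables(
    session_id: int,
    events: list[tuple[str, str, str]],
    routes: list[tuple[str, str]],
    aliases: list[tuple[str, str]],
) -> tuple[list[EventRow], list[PropertyRow]]:
    """Return (event rows, property rows); no separate route or alias table is needed."""
    event_rows: list[EventRow] = [
        (session_id, src, action, dst, "direct") for src, action, dst in events
    ]
    # Each route hop is recorded as an indirect event between two computer entities.
    event_rows += [
        (session_id, hop_from, "ROUTE", hop_to, "indirect") for hop_from, hop_to in routes
    ]
    # Each alias is recorded as an "alias" property of the entity it describes.
    property_rows: list[PropertyRow] = [(entity, "alias", alias) for entity, alias in aliases]
    return event_rows, property_rows
```

The `kind` column keeps indirect (reported) events distinguishable from directly observed ones, so an analyzer can still filter them apart.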

In the exemplary embodiment set forth in FIG. 6, assembler 106 (FIG. 1) receives packets in step 602. The packets are assembled into sessions in step 604. Protocol-specific parsers 116 (in this case, one parser for each protocol in the session) extract session properties in step 606. Protocol-specific parsers 116 then, from within the session, identify events in step 608, identify routes in step 610, identify entities in step 612, identify entity aliases in step 614, identify actions in step 616, and extract event properties in step 618. Protocol-specific parsers 116 continue to parse the session until all events within the session have been parsed, as determined in step 620, and then parse other sessions in the same manner.
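The FIG. 6 loop separates a protocol-independent driver from protocol-specific hooks. The Python sketch below is one hypothetical way to express that split: `ProtocolParser`, `convert_session`, and `ToyFtpParser` are illustrative names, and the toy parser works on a pre-extracted list of FTP control commands (the standard RFC 959 verbs USER, STOR, RETR, and DELE) rather than on raw packets. As in FIG. 6, the service is recorded once, as a session property.

```python
from abc import ABC, abstractmethod


class ProtocolParser(ABC):
    """Hypothetical interface for a protocol-specific parser; one hook per step of FIG. 6."""

    @abstractmethod
    def session_properties(self, session) -> dict: ...  # step 606

    @abstractmethod
    def events(self, session) -> list: ...  # step 608

    def routes(self, event) -> list:  # step 610 (many protocols carry no route data)
        return []

    @abstractmethod
    def entities(self, event) -> tuple: ...  # step 612

    def aliases(self, event) -> dict:  # step 614
        return {}

    @abstractmethod
    def action(self, event) -> str: ...  # step 616

    def event_properties(self, event) -> dict:  # step 618
        return {}


def convert_session(parser: ProtocolParser, session) -> dict:
    """Protocol-independent driver: visits every event in the session (the step 620 loop)."""
    converted = {"properties": parser.session_properties(session), "events": []}
    for event in parser.events(session):
        converted["events"].append({
            "routes": parser.routes(event),
            "entities": parser.entities(event),
            "aliases": parser.aliases(event),
            "action": parser.action(event),
            "properties": parser.event_properties(event),
        })
    return converted


class ToyFtpParser(ProtocolParser):
    """Toy parser over a list of FTP control commands, for illustration only."""

    ACTIONS = {"USER": "User Login", "STOR": "Put Resource",
               "RETR": "Get Resource", "DELE": "Delete Resource"}

    def session_properties(self, session) -> dict:
        return {"service": "FTP", "client": session["client"], "server": session["server"]}

    def events(self, session) -> list:
        # Keep only commands with a known action; attach the endpoints to each event.
        found = []
        for line in session["commands"]:
            verb, _, arg = line.partition(" ")
            if verb in self.ACTIONS:
                found.append((session["client"], session["server"], verb, arg))
        return found

    def entities(self, event) -> tuple:
        client, server, _verb, _arg = event
        return (client, server)

    def action(self, event) -> str:
        return self.ACTIONS[event[2]]

    def event_properties(self, event) -> dict:
        return {"argument": event[3]}
```

Adding support for another protocol then means writing another `ProtocolParser` subclass; `convert_session` and everything downstream of it stay unchanged.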

The method illustrated in FIG. 6 presumes that the service type will be the same for all events in a session. Accordingly, the service is extracted as a property of the session. Alternatively, the service type can be identified for each event. In that case, the method performs the step of identifying a service type in the session in step 617.

FIG. 7 illustrates an example of applying the present invention to parse an SMTP (Simple Mail Transfer Protocol) session into the event-based language. In FIG. 7, area “A” displays data from the session in its native protocol form; the session consists of multiple data packets for an e-mail that was sent from one user to another. The session includes network-level data (e.g., Ethernet and TCP/IP) and application data (e.g., SMTP and Microsoft Outlook).

Area “B” displays the metadata that describes the session according to the event-based language. The overall SMTP session is described by four properties: time, size, service, and subject (not shown). The session includes three separate events: (1) a first event between the source computer (entity) and the target computer (entity) for an IP transaction (action); (2) a second event between the port (entity) of the source computer and the port (entity) of the target computer for a TCP transaction (action); and (3) a third event between the source user (entity) and the target user (entity) for sending a message (action). The service type (SMTP) is not separately recited for each of the events because it is the same for all events in the session.

Properties of the third event are also identified. The properties include the identity of the application (MS Outlook) and the attached file (winmail.dat).

FIG. 8 illustrates an example of applying the present invention to parse an FTP (File Transfer Protocol) session into the event-based language. In the session of FIG. 8, a user has logged into a site, stored a file, retrieved some data, and then deleted the file. Area “A” of FIG. 8 shows network-level data and application data from the packets within the session. By application of the invention, the session is translated into the metadata conforming to the event-based language shown in area “B.”

FIGS. 7 and 8 illustrate the benefits of the invention. The protocol-specific data in area A of both figures is complex and unwieldy. More importantly, the extracted data for the SMTP session (shown in FIG. 7) is very different from the extracted data for the FTP session (shown in FIG. 8). Additionally, the extracted data (area A) is not readily understood in terms of the events that took place. Without the present invention, logs of SMTP sessions and FTP sessions would each require a separate analysis tool.

When a session is converted to metadata conforming to the event-based language (as shown in areas B of FIGS. 7 and 8), the network-level events are readily understood. The metadata for different protocols (here, SMTP and FTP) can be stored in the same finite set of tables in a log or record. Importantly, the same analysis tool or tools can be used to analyze both types of sessions.

FIGS. 10, 11 a, and 11 b show exemplary data from protocol-specific sessions. FIG. 10 illustrates data from a session conforming to the HTTP protocol. FIG. 11 a illustrates data from a session conforming to the SMTP protocol. FIG. 11 b illustrates data from a session conforming to the FTP protocol.

FIG. 12 a illustrates a log output file of the three sessions illustrated in part in FIGS. 10, 11 a, and 11 b after they have been parsed into metadata conforming to the event-based language of the present invention. The metadata for the first session is represented in the first seven lines of the exemplary log output file. The metadata for the second session is represented in lines eight to eighteen of the exemplary log output file. The metadata for the third session is represented in lines nineteen to twenty-three of the exemplary log output file. This output follows the form shown in FIG. 12 b.

In FIG. 12 b, the terms shown after “S:” relate to types of metadata about the session of data of which an event is a part. The terms shown after the first two “P:” markers relate to metadata about properties of the session. The terms shown after “E:” relate to types of metadata about the event; for example, “<source name: subname>” and “<target name: subname>” are the entities involved in the event. The terms shown after the “P:” markers below “E:” relate to types of metadata about properties of the event. The terms shown after “A:” relate to types of metadata about an alias or aliases of these entities. The terms shown after “R:” relate to types of metadata about the route or routes taken by the session of data or by the data packets that make up the session. As can be readily seen, this exemplary embodiment of the invention parses sessions in disparate protocols into a compact output conforming to a common language.
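The line-oriented layout of FIG. 12 b can be approximated by a small writer. The field order and the tab separator used here are assumptions, since the exact field layout is defined by the figure; the sketch reproduces only the prefix structure: one “S:” line, the session “P:” lines, then for each event an “E:” line followed by its “P:”, “A:”, and “R:” lines. `render_log` and the dictionary shape it reads are hypothetical.

```python
SEP = "\t"  # assumed field separator


def render_log(session: dict) -> list[str]:
    """Render one parsed session as prefixed log lines (S:, P:, E:, P:, A:, R:)."""
    lines = [f"S:{SEP}{session['id']}{SEP}{session['service']}"]
    lines += [f"P:{SEP}{name}{SEP}{value}" for name, value in session["properties"].items()]
    for event in session["events"]:
        lines.append(f"E:{SEP}{event['source']}{SEP}{event['action']}{SEP}{event['target']}")
        lines += [f"P:{SEP}{n}{SEP}{v}" for n, v in event.get("properties", {}).items()]
        lines += [f"A:{SEP}{entity}{SEP}{alias}" for entity, alias in event.get("aliases", {}).items()]
        lines += [f"R:{SEP}{hop_from}{SEP}{hop_to}" for hop_from, hop_to in event.get("routes", [])]
    return lines
```

Because every protocol's parser output passes through the same writer, HTTP, SMTP, and FTP sessions end up in one uniform log, which is the point the figure makes.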

The foregoing disclosure of the preferred embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many variations and modifications of the embodiments described herein will be obvious to one of ordinary skill in the art in light of the above disclosure.

Further, in describing representative embodiments of the present invention, the specification may have presented the method and/or process of the present invention as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As one of ordinary skill in the art would appreciate, other sequences of steps may be possible. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations of the invention.

1. A method of analyzing network traffic, comprising: obtaining packets traversing an electronic data network; sending the packets to an assembler and assembling the packets into at least one session, wherein the at least one session comprises a network transaction between a first entity and a second entity; translating the at least one session into metadata that conforms to an event-based language; and creating a record, based on the metadata, containing a session statement that includes a property statement regarding the session, and an event statement that includes a property statement regarding the event.
 2. The method of claim 1, wherein the step of obtaining comprises copying packets from the electronic data network.
 3. The method of claim 1, further comprising parsing the packets based on protocol.
 4. The method of claim 1, wherein the first and second entity comprise a computer, a user, or a resource.
 5. The method of claim 1, wherein the property statement regarding the session comprises at least one of time, size, or service type.
 6. The method of claim 1, wherein the property statement regarding the event identifies an action, and the action comprises at least one of the following action types: User Login, User Logoff, Get Resource, Put Resource, Delete Resource, Send Message, Receive Message, Read Message, Delete Message, Database Query, User Login Response, User Logoff Response, Get Resource Response, Delete Resource Response, Send Message Response, Read Message Response, or Database Query Response.
 7. The method of claim 6, further comprising presenting the metadata in accordance with the following structure: “<the first entity> was seen <the action> to <the second entity> with <the application>.”
 8. The method of claim 1, wherein the property statement regarding the event identifies an application and the application is one of the following application types: FTP, Telnet, SMTP, Domain Name Service, DHCP, AOL™ Instant Messenger, Yahoo™ Instant Messenger, HTTP, POP-2, POP-3, NNTP, Microsoft RPC, Netbios, MS File Access, SNMP, RIP, MS Instant Messenger, Lotus Notes™, Sybase™ Database, MSSQL™ Database, Oracle™ Database, Lotus Sametime™, Unix™ File Access, or IRC.
 9. The method of claim 1, wherein the event statement identifies one of the following content types: Mail, HTML, DCARD, SMIME, or PGP.
 10. The method of claim 1, further comprising generating a route statement describing a route through the network traveled by the session, wherein the record also contains the route statement.
 11. The method of claim 1, further comprising generating an alias statement describing information related to an identity of the first entity or the second entity, wherein the record also contains the alias statement.
 12. The method of claim 11, wherein the alias statement contains at least one of the following alias types: IP-Alias or User-Alias.
 13. The method of claim 1, wherein the metadata comprises a global property name.
 14. A method of analyzing network traffic, comprising: obtaining packets traversing an electronic data network; sending the packets to an assembler and assembling the packets into at least one session, wherein the at least one session comprises a network transaction between a first entity and a second entity; translating the at least one session into metadata that conforms to an event-based language; and based on the metadata, generating a session statement that includes a property statement regarding the session, and an event statement that includes a property statement regarding the event.
 15. The method of claim 14, wherein the property statement regarding the session comprises at least one of time, size, or service type.
 16. The method of claim 14, wherein the property statement regarding the event identifies an action and the action comprises at least one of the following action types: User Login, User Logoff, Get Resource, Put Resource, Delete Resource, Send Message, Receive Message, Read Message, Delete Message, Database Query, User Login Response, User Logoff Response, Get Resource Response, Delete Resource Response, Send Message Response, Read Message Response, or Database Query Response.
 17. The method of claim 16, further comprising presenting the metadata in accordance with the following structure: “<the first entity> was seen <the action> to <the second entity> with <the application>.”
 18. The method of claim 14, wherein the property statement regarding the event identifies an application and the application is one of the following application types: FTP, Telnet, SMTP, Domain Name Service, DHCP, AOL™ Instant Messenger, Yahoo™ Instant Messenger, HTTP, POP-2, POP-3, NNTP, Microsoft RPC, Netbios, MS File Access, SNMP, RIP, MS Instant Messenger, Lotus Notes™, Sybase™ Database, MSSQL™ Database, Oracle™ Database, Lotus Sametime™, Unix™ File Access, or IRC.
 19. The method of claim 14, wherein the metadata comprises a global property name.
 20. A method of analyzing network traffic, comprising: receiving electronic data representative of an electronic data network session that has taken place between a first entity and a second entity; translating the session into metadata that conforms to an event-based language; and creating a record, based on the metadata, containing a session statement that includes a property statement regarding the session, and an event statement that includes a property statement regarding the event.